11  Time Series Decomposition

CautionStill under construction

This section is still under construction and will be completed in the near future. Please do not go beyond this point for now.

Time series data often exhibit patterns such as trends, seasonality, and cycles. Decomposing a time series into its components can help to better understand the underlying processes.

Time series are usually decomposed into three components: - Trend: \(T_t\), long-term progression of the series (e.g., increasing or decreasing). - Seasonality: \(S_t\), regular patterns that repeat over a fixed period (e.g., daily, weekly, yearly). - Residual: \(I_t\), random noise or irregular component.

Additive models assume that the components add together to form the time series: \[ Y_t = T_t + S_t + I_t \]

Multiplicative models assume that the components multiply together to form the time series: \[ Y_t = T_t \times S_t \times I_t \]

Note

Sometimes, also a fourth component, Cyclic (\(C_t\)), is considered, which represents long-term oscillations that are not of fixed period.

11.1 Examples

Imports

This example uses the package statsmodels for time series decomposition.

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go
import plotly.io as pio
import plotly.subplots as sp

from statsmodels.tsa.seasonal import seasonal_decompose, STL  # used for decomposition


pio.renderers.default = "notebook"  # set the default plotly renderer to "notebook" (necessary for quarto to render the plots)

Generating Example Data

np.random.seed(42)  # for reproducibility

date_range = pd.date_range(start="2010-01-01", end="2024-12-31", freq="MS")

trend = 100 + 0.9 * np.arange(len(date_range))                      # nonlinear trend
seasonality = 121 * np.sin(np.pi * (date_range.month / 12))         # yearly seasonality

noise = np.random.normal(0, 21, len(date_range))                    # random noise at each point

# Add outliers with a probability of 0.05
outlier_mask = np.random.rand(len(date_range)) < 0.05
outlier_values = np.random.normal(10, 100, len(date_range))  # large outliers
noise[outlier_mask] += outlier_values[outlier_mask]

sales = trend + seasonality + noise                                 # final data as sum of all components

df = pd.DataFrame({"Date": date_range, "Sales": sales}).set_index("Date")
fig = px.line(df, y="Sales", title="Sales Over Time")
fig.show()

In the above plot, we can see a clear upward trend, but identifying seasonality is more difficult.

To support this visual analysis, we add vertical lines at the beginning of each year.

fig = px.line(df, y="Sales", title="Sales Over Time")

for year in range(df.index.year.min(), df.index.year.max() + 1):
    fig.add_vline(x=pd.Timestamp(f"{year}-01-01"), line_dash="dash", line_color="gray", opacity=0.5)

fig.show()

Now it is apparent that there is a yearly seasonal pattern, with peaks around mid-year.

Algorithmic decomposition of a time series into its components is not a trivial task. Many algorithms exist to achieve this, each with its own advantages and disadvantages.

In the following, we use a very simple additive model based on moving averages to decompose our time series. First, the trend is estimated using a moving average with a smaller window size. Then, the detrended series is calculated by subtracting the trend from the original series. The seasonal component is estimated by averaging the values for each season (e.g., each month, year, ..) in the detrended time series. Finally, the residual component is calculated by subtracting both the trend and seasonal components from the original series.

Note that this is a very naive approach and according to the documentation of seasonal_decompose a more sophisticated method should be preferred. However, it is easy to understand and serves well for demonstration purposes.

decomposed_ts = seasonal_decompose(df, model='additive')
decomposed_ts.plot();  # the object has a built-in plot method (based on matplotlib)

Here is a helper function to create a plotly figure with four plots.

def seasonal_decompose_plotly(decomposed_ts, title):
    """
    Plots a time series decomposition using Plotly.
    
    Parameters:
    decomposed_ts (DecomposeResult): Decomposed time series result.

    Returns:
    plotly object
    """

    # Extract components
    observed = decomposed_ts.observed
    trend = decomposed_ts.trend
    seasonal = decomposed_ts.seasonal
    resid = decomposed_ts.resid

    # Create subplots
    fig_decomp = sp.make_subplots(rows=4, cols=1, shared_xaxes=True,
                                subplot_titles=["Observed", "Trend", "Seasonal", "Residual"])

    fig_decomp.add_trace(go.Scatter(x=observed.index, y=observed.values.flatten(), name="Observed"), row=1, col=1)
    fig_decomp.add_trace(go.Scatter(x=trend.index, y=trend, name="Trend"), row=2, col=1)
    fig_decomp.add_trace(go.Scatter(x=seasonal.index, y=seasonal, name="Seasonal"), row=3, col=1)
    fig_decomp.add_trace(go.Scatter(x=resid.index, y=resid, name="Residual"), row=4, col=1)

    fig_decomp.update_layout(height=900, title_text=title)
    
    return fig_decomp

Please note the different \(y\)-axes scales in the following plot.

seasonal_decompose_plotly(decomposed_ts, title="Seasonal Decomposition of Time Series (Additive)")

A more robust method for time series decomposition is STL (Seasonal and Trend decomposition using Loess).

It not only allows for more flexibility in the decomposition process (e.g. look at seasonal changes over time), but is also more robust to outliers.

decomp_stl = STL(df, robust=True).fit()  # robust=True to be less sensitive to expected outliers
seasonal_decompose_plotly(decomp_stl, title="STL Decomposition")

Having the decomposed components at hand, opens a variety of applications.

11.1.1 Denoising

Using the trend and seasonal components from the STL decomposition, we can create a denoised version of the original time series.

decomp_stl_estimated = decomp_stl.seasonal + decomp_stl.trend
fig = px.line(decomp_stl_estimated, title="Estimated versus Observed Sales", labels={"value": "Sales", "index": "Date"})
fig.update_traces(name="Estimated", showlegend=True)

fig.add_trace(go.Scatter(x=df.index, y=df["Sales"], mode='lines', name="Observed"))
fig.show()

In previous plot, we can see that the estimated values follow the observed values quite closely, indicating that the STL decomposition has effectively captured the underlying patterns in the data. But we can also see some points with larger deviations, which are likely outliers in the original data.

11.1.2 Anomaly Detection

One way to quantify outliers is to apply a threshold to the residuals (e.g., 3 times the standard deviation of the residuals, also known as the 3-sigma rule).

resid = decomp_stl.resid
std_resid = resid.std()

fig = go.Figure()
fig.add_trace(go.Scatter(x=resid.index, y=resid, mode='lines', name='Residual'))
# Add horizontal lines for ±3 std. dev.
fig.add_hline(y=resid.mean() + 3 * std_resid, line_dash="dash", line_color="blue", opacity=0.7, name="+3 Std. Dev.")
fig.add_hline(y=resid.mean() - 3 * std_resid, line_dash="dash", line_color="blue", opacity=0.7, name="-3 Std. Dev.")

# Fill area between the lines
fig.add_traces([
    go.Scatter(
        x=resid.index, 
        y=[resid.mean() + 3 * std_resid] * len(resid), 
        mode='lines',
        line=dict(width=0),
        showlegend=False,
        hoverinfo='skip'
    ),
    go.Scatter(
        x=resid.index, 
        y=[resid.mean() - 3 * std_resid] * len(resid), 
        mode='lines',
        fill='tonexty',
        fillcolor='rgba(0,0,255,0.08)',
        line=dict(width=0),
        showlegend=False,
        hoverinfo='skip'
    )
])

fig.update_layout(title="Residual with ±3 Std. Dev. Area", yaxis_title="Residual")
fig.show()

So the outliers are:

outliers = df[(resid > resid.mean() + 3 * std_resid) | (resid < resid.mean() - 3 * std_resid)]
outliers
Sales
Date
2014-09-01 447.316646
2016-07-01 139.561235
2018-03-01 489.942096
2021-06-01 227.748919
fig = px.line(df, y="Sales", title="Sales with Outliers Highlighted", labels={"value": "Sales", "index": "Date"})
fig.add_scatter(
    x=outliers.index,
    y=outliers["Sales"],
    mode="markers",
    marker=dict(color="red", size=10, symbol="x"),
    name="Outliers"
)
fig.show()

11.1.3 Detrending

Using the decomposed components also allows for detrending the time series, which can be useful for further analysis.

# Detrend the observed series by subtracting the trend component
detrended = decomp_stl.observed["Sales"] - decomp_stl.trend

fig = px.line(detrended, title="Detrended Sales Over Time", labels={"value": "Detrended Sales", "index": "Date"})
fig.show()

11.1.4 Deseasonalizing

Another important application is to deseasonalize the time series, which is often a prerequisite for forecasting models.

# Deseasonalize the observed series by subtracting the seasonal component
deseasonalized = decomp_stl.observed["Sales"] - decomp_stl.seasonal

fig = px.line(deseasonalized, title="Deseasonalized Sales Over Time", labels={"value": "Deseasonalized Sales", "index": "Date"})
fig.show()